From: "World Chess Championship", INTERNET:newsletter@mark-weeks.com
Date: 01/02/01, 16:17
Re: Chess History on the Web (2001 no.3)

Site review - UPITT (V; Events)

In 'Site review - UPITT (III)' [Chess History on the Web (2000 no.24)] we looked at UPITT's (the University of Pittsburgh) PGN (Portable Game Notation) archive at...

   http://www.pitt.edu/~schach/

...In that review I described the structure of the UPITT archive, ran some simple tests on the internal consistency of the data, and examined the game collections covering single players. Except for a few statistics, I said little about UPITT's collection of chess events.

In the directory PGN/Events I found 1275 files totaling 34.0 megabytes in size, and I counted the files in the different PGN/Events subdirectories to see how they are distributed across the years in which the events were played:-

   1991   384  (0-1991)
   1992    83
   1993    51
   1994   131
   1995   219
   1996   180
   1997    90
   1998    95
   1999    34
   2000     8

This shows that 384 pre-1992 events are covered by files in UPITT's archive. In this current review I'd like to go back and take a closer look at the events collection.

To get started I wanted to use the same database as for the previous review, which was built from data captured around the beginning of December. What's been added since then? The directory chess/Newstuff lists NEW1118.TXT as its newest file. NEW1118.TXT references three PGN files, covering these events:-

   62BCF-PG.ZIP   1962 British Championship
   80HAS-PG.ZIP   1920 Hastings
   84BUG-PG.ZIP   1984 Bugojno BIH (Cat. 14)

It also mentions that these last two files are from James Tan. A few years ago Tan sent me games that I was missing from a couple of Interzonals, so I'm familiar with his work. It's good work and I was pleased to see that UPITT data is generated by archivists who know what they're doing.

Since I had mentioned in the previous review that the latest PGN files were date stamped 2000-11-18, I was certain that nothing had changed in the intervening six weeks.
You may have noticed that the description for 80HAS-PG.ZIP says '1920 Hastings'. The participants are listed in the NEW1118.TXT file as 'Andersson, Torre, Lev Alburt et al.', so the correct year is 1980. This error has not crept into ALLINDEX.TXT, the UPITT file describing the content of the entire archive.

I have two tables with information about the PGN/Events collection. The first lists 1279 entries taken from the ALLINDEX.TXT file; the second lists 1278 entries taken from the FTP directories. Both tables have more entries than the 1275 files that I reported above. Where did my number 1275 come from?

When I went back and looked at the calculations I made for total filesize, I discovered that I had eliminated three files because their filesize unit of measure ('bytes') was not the same as for the other files in the archive ('kilobytes'). I downloaded these three small files and discovered that one of them contains only two games, while the other two are empty. I now stand corrected; the UPITT events archive contains 1278 files, of which two are empty.

There is another discrepancy in my database. The number of events in ALLINDEX.TXT (1279 entries) doesn't match the number of files in the FTP directories (1278 entries). When I checked for entries which are in one table but not in the other, I found:-

   - 17 only in ALLINDEX.TXT
   - 16 only in the FTP directories

Most of these were due to typos in the index file. For example, the file 07WIENPG is indexed as 02WIENPG ('Vienna 1907'); 35TAT-PG is indexed as 35BLEDPG ('1935 Tatatovaros'), etc. After I resolved these typos in the index file, I was left with two files in the FTP directories (98CROWPG & 98OLYMPG) which were not in the index file. The index file, in turn, listed three files (75LASPPG ['1975 Las Palmas'], 76LP-PG ['1976 Lone Pine'], and 79LP-PG ['1979 Lone Pine']) which do not exist in the FTP directories.

In addition to the PGN collections, UPITT has collections for Chess Assistant (CA), ChessBase (CB), and NicBase (NB).
These collections are largely similar; each file in PGN format has an equivalent in each of the other three formats. When I examined the player collections I found that the ChessBase format covered considerably more players than the PGN collection did. How do the event collections look?

I compared the PGN collections for pre-1992 events against the equivalent collections for the other formats and got these counts:-

    Ct  Format
   386  PGN
   385  CB
   383  CA
   183  NB

Since the NicBase collection is clearly less comprehensive than the others, I excluded it from further consideration. The PGN collection lists four files (49USOPEN, 59USOPEN, 69USOPEN, & USSR73PG) which are in neither Chess Assistant nor ChessBase. Chess Assistant and ChessBase both list one file (1895H) which is not in PGN, while ChessBase lists two more files (1851LCBV & COMP1970) which are not in PGN.

As a last test, I looked to see if there were errors in ALLINDEX.TXT like the one described above for 1920 Hastings, where the index description doesn't match the file name. I found only three.

When I add up these little problems -- empty files, index errors, mismatches with other formats -- I find that about 2% of the PGN files have an associated administrative error. This seems to be a small percentage for an archive which is probably administered on a volunteer basis.

---

A bigger question is, 'How complete is the UPITT collection?' To get an answer, I decided to take a closer look at Chess Records Management. CRM is found at address...

   http://users.imag.net/~lon.jpope/

...The main page says, 'Welcome to CRM. This collection of historical tournaments might be the largest tournament collection of its kind in the world. These games are available through trade only. There are more than 6,000 tournaments, and more than 1.4 million games.
The database has been weeded of "offhand" or "casual" games, computer games, low rated or unrated games, casual games, internet games, and various other games which have no practical value.'

No games are available for download from the site, but lists of events covered by the collection are available. The main page ('last updated on December 15, 1999') links to 16 other pages, each listing a few hundred events. The events have been categorized as 'Round Robin and Knock-Out Tournaments', 'Matches', 'Team Tournaments', and 'Swiss System Tournaments'. Each of these four categories has been further split chronologically. Matches, for example, are covered by three pages -- 1834 to 1899, 1900 to 1959, and 1960 to 1998.

The person behind all of this is John Pope. While looking through the list of events, I noticed that many of them were from Michigan -- 27 (by exact count) vs. 2 for California -- and I wondered if John Pope is perhaps related to Nick (Jacques) Pope, who also hails from Michigan, and who runs both the excellent Chess Archaeology site at www.chessarch.com and the Michigan Chess Association site.

A Google search on 'john pope chess' produced many matches from different English-speaking countries. One of the most interesting is...

   http://movie.epinions.com/mvie-review-737-110792-38BE0C02-prod7

...a review of the film 'Searching for Bobby Fischer' which calls it the 'Best Chess Movie (are there any others?)'. The reference to John Pope says:-

'Another contrast is set up between the likeable Josh, and the obnoxious chess prodigy John Pope. This too is based on a real life character; only the name has been changed to protect the snotty kid. Chess players will recognize the type, even if they've forgotten the actual player -- it's the cocky kids who walk around other chess players and laugh at their moves.
'Actually the cocky "villain" of the story, John Pope, parallels the legendary Bobby Fischer, who appears in some fascinating black and white newsreel footage at various points in the story. Pope is very much like Fischer, since he has dropped out of school very young and has devoted his entire life to mastering the royal game -- to both Fischer and Pope chess is everything -- losing is unthinkable.'

Another surprise from my Google search was a large number of hits for Pope John and Pope John Paul. On one of these pages I learned that FIDE, at its 1999 Diamond Jubilee celebration in Paris, awarded the title 'Grand Commander of the Legion of Grandmasters' to Pope John Paul II and to French president Jacques Chirac. Another page at...

   http://www.sonic.net/finearts/popelitl.html

...shows a very nice painting by Keith Halonen, titled 'The Pope's little two-mover', a problem 'composed in 1946 by former Polish tank commando, Karol Wojtyla, nephew of famed Polish problem composer M. Wröbel. Wojtyla is known today as Pope John-Paul II'. Another page, written by the ubiquitous Bill Wall, at address...

   http://www.geocities.com/siliconvalley/lab/7378/religion.htm

...is called 'Religion and Chess' and, among many other facts, informs us that 'Religious leaders who have played chess include Thomas Becket (Archbishop of Cantebury), Charles Borromeo (Bishop of Milan), Pope Gregory VI, Pope Innocent III, Pope John Paul I, Pope John Paul II, Pope Leo X, Pope Leo XIII, Cardinal Richelieu, and Billy Graham.'

The only page I found relevant to CRM was the October 1997 issue of Connecticut Chess Magazine at...

   http://www.geocities.com/robroy54/archives/ccm217.txt

...where an article announced the availability of 'Chess Records Management, A New Chess Service'. It continued:-

'Dear Chess Player, My name is John Pope. I am an MLS Librarian, and have been a chess enthusiast for many years.
I have been building a top-quality chess database for the last 10 years, and now I am ready to offer public access.

'The service name is Chess Records Management. We intend to provide a chess database service to competitive players, collectors, and researchers who do not have the time or interest in developing their own databases, but still have specific chess information needs. [...]

'How It Works: This collection has been available for local chess buffs, and has been greatly appreciated. Now, it will be available to chess buffs worldwide.

'We can search the database by tournament, player or opening variation. You tell us what you want, and we tell you how many games we have available based on the criteria you give us.

'Then, you tell us how many games you want. As an introductory offer, games cost $5.00 for the first 100 games or fewer. After that it is $1 per 20 games. For example, 235 games will cost you $12.00. All shipments of games must be pre-paid and there are no refunds. [...]

The Future Of CRM: 'If this service proves to be very popular, as we expect it should, we plan to have a commercial web site up and running soon. This will allow patrons to browse our tournament listings. We also have a "wish" list for historical tournaments we want, and we do accept selective trades. You can make the CRM Database YOUR database! Please pass this letter on to your chess friends, and let us know what you think of the CRM service idea! For more information e-mail jpope@wwdc.com

'Thanks, John Pope, CRM Librarian'

I'm still in the dark about the Michigan connection. It may simply be that Pope has an active correspondent in Michigan. In any case, I now knew considerably more about the history and objectives of the site.

CRM's main page links to a glossary which describes codes used on the other pages for events. For example, 'ct' means candidate tournament, 'DRR' means double round robin, and 'jr' means a junior event.
This additional information gives the site great historical value. Almost all events are listed with the year of the event, venue, purpose (as described by the codes in the glossary), and winner(s). The different event categories have more information specific to the category. Round robins, for example, are listed with the number of players and the FIDE category (for events after 1970); Swiss system tournaments are given with the number of rounds.

I downloaded all of this data and loaded it into a database for further analysis. This gave me a count of the number of events and the number of games in each category:-

     Ct  Type  #games
    653  M       5613
   4000  RR    279205
   1363  SS    256955
    649  TT    171034

You can see from this that the CRM collection covers 653 matches (type 'M') for a total of 5,613 games. The largest category, round robins ('RR'), covers exactly 4000 events for a total of 279,205 games. Another piece of CRM data let me calculate that in these same 4000 events, a total of 304,349 games were played. This means that CRM is missing 25,144 (about 8.3%) of the games played in these events. That's very impressive coverage. The total of 712,807 games across all four categories falls far short of the '1.4 million games' mentioned on the CRM home page, and I don't know why the difference is so big.

The CRM events are distributed by decade as shown in the first column of the following table; the third column shows the number of UPITT events in the same decade:-

    CRM  Decade  UPITT
      1  182x
      6  183x        1
     15  184x        6
     25  185x        2
     46  186x        1
     21  187x        2
     22  188x        8
     58  189x        9
     93  190x       16
     77  191x        8
    155  192x       30
    214  193x       19
    176  194x       14
    307  195x       40
    522  196x       35
    601  197x       67
   1185  198x       77
   3141  199x      935
      0  200x        8

The number of CRM events has increased steadily decade over decade, except for drops in the 1910s and 1940s, undoubtedly due to the two world wars. There was also a drop from the 1860s to the 1870s and 1880s.

The first events in the CRM database are matches...

   Ct  Year
    1  1824  corr. Edinburgh vs.
             London (tt) [5 games; 1st TT]
    6  1834  LaBourdonnais de,L - Macdonnell,A [6 matches]
    2  1842  Hanstein,W-Jaenisch,C (m:5 games); Saint Amant,P - Schulten,J (m:2 games)
    2  1843  Saint Amant,P - Staunton,H [2 matches]

...except the very first, which is classified as a team tournament ('TT'). The second event that was classified as a team tournament took place in 1896...

    7  1896  Great Britain vs. USA (tt Telex); [no winner noted; 2nd TT]

...while the first events in the other categories were in 1849 and 1936...

    3  1849  London (KO); 12 plyrs; Buckle,H [1st RR]
   28  1936  Buenos Aires (ARG-ch); 7 rds; [no winner noted; 1st SS]

...Using the CRM data, I worked out a large number of original observations on historical chess events. Since I'm running out of space here, I'll share these in the next review.

I'm going to wrap up this review by developing a few metrics to estimate how complete the UPITT and CRM data are. I chose the year 1985 as a sample because it was a watershed year in chess history. It was in 1985 that (1) Campomanes stopped the first Karpov - Kasparov world championship title match and that (2) Kasparov won the title in the second K-K match. These results set off a sequence of events which are still rocking the chess world today.

CRM lists 112 events played in 1985. Of these, 10 are covered by the 9 events in UPITT for 1985; the Biel Interzonal, listed once in UPITT, is listed as two separate events in CRM -- once for the tournament and once for the playoff.

Chess Informant numbers 39 & 40 also covered 1985. Each issue had two lists of events:-

   1) events used to calculate new FIDE ratings (Inf.39 : 321 events & Inf.40 : 372 events)
   2) events considered important enough to list the result (Inf.39 : 101 events & Inf.40 : 112 events)

This means that there were 693 FIDE rated events in 1985, of which 213 were summarized by Informant. CRM's 112 events amount to about half of the 213 that Informant summarized. UPITT doesn't even come close, but it more than makes up for this by offering game scores.
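For readers who want to redo the 1985 arithmetic, here is a minimal sketch in Python using only the counts quoted above. The variable names and the share() helper are my own inventions for illustration. The sketch compares raw counts only; it assumes nothing about whether the CRM or UPITT events are actually among those listed by Informant, so the percentages are rough size comparisons and not true coverage figures.

```python
# Event counts for 1985, as quoted in this review.
informant_rated = {"Inf.39": 321, "Inf.40": 372}       # events used for FIDE ratings
informant_summarized = {"Inf.39": 101, "Inf.40": 112}  # events with results listed
crm_events = 112    # events listed by CRM for 1985
upitt_events = 9    # event files in UPITT for 1985


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_rated = sum(informant_rated.values())            # 693
total_summarized = sum(informant_summarized.values())  # 213

# Compare each collection against both Informant totals.
for name, count in (("CRM", crm_events), ("UPITT", upitt_events)):
    print(name,
          "vs. summarized:", share(count, total_summarized), "%;",
          "vs. rated:", share(count, total_rated), "%")
```

The same helper could be reused for any other sample year, provided the four input counts are collected in the same way.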
How many other events were played throughout the world is anybody's guess. If you have an idea how to estimate this, send me a note.

Bye for now,
Mark Weeks